Introduction

With the growing presence of technology in our society, there is an rapidly increasing demand for hardware which supports our heavy computational demands. One of the most important pieces of computer hardware for computationally-intensive tasks is Graphics Processing Units (GPU) due to their ability to handle a wide range of parallel processing tasks — this has made them an invaluable resource for companies pursing any sort of Artificial Intelligence (AI), super-computing, crypto-currencies, or computer graphics. Consequently, companies which manufacture processing chips have been a focal point for investors in recent years, as they directly control the output capabilities for a large portion of the technology sector.

The purpose of this project is to attempt to predict the price-trends of a semiconductor stock - in fact, there are actually four stocks we wish to predict: Nvidia, Advanced Micro Devices (AMD), Intel Corporation, and Taiwan Semiconductor Manufacturing. Though tackling four separate stocks slightly deviates from the project guideline of applying a variety of machine learning algorithms to a single data-set, the hope is that this will give a better prediction of which machine learning models are most useful for stock market prediction (in the case of semiconductor stocks). Put differently, if we only applied our statistical learning models to a single stock resulting in an optimal model \(M\), this would naturally raise the question of whether \(M\) fits the stock best because it is optimal for stock market predictions, or if it fits the stock best due to the stock’s characteristics. Therefore, for each of the four chip manufacturers listed above, we will apply a variety of statistical learning models , ranging from standard regression to more non-linear models like random forest learning and k-Nearest neighbors.

Loading Packages and Data

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(tidymodels)
library(ggplot2)
library(corrplot)
library(discrim)
library(ggthemes)
library(kableExtra)
library(yardstick)
library(visdat)
library(scales)
library(glmnet)
library(quantmod)
library(dygraphs)
library(tidyr)


tidymodels_prefer()
conflicted::conflicts_prefer(yardstick::rsq)
conflicted::conflicts_prefer(dplyr::lag)
set.seed(3435)

One of the most useful packages for the analysis of stock market data in R is the quantmod package. While there are several built-in functions and tools which predict stock market trends using statistical learning, we will simply use this package for the purpose of pulling live stock data from Yahoo Finance and visualizing the data in a much neater fashion than usual ggplot2 (in addition, using built-in statistical learning methods in a statistical learning project is somewhat poor form).

In particular, the quantmod package allows us to circumvent the process of downloading a CSV file of stock market data from the internet and read it into R by using the loadSymbols method, which then automatically loads all stock data (over a potentially specified interval) into a table assigned the same name as the stock’s symbol. Since the function loadSymbols will automatically set the most recent day to pull stock data as the current day, we must specify a “last day” to pull our stock data to ensure that our models run the same every day; thus, we will pull one fiscal year’s worth of stock data ranging over all of 2023. It should also be noted that the table loaded by loadSymbols is not a data-frame; its original datatype must be used to aesthetically plot all the stock information via chartSeries, but we will later need to typecast this object to a usual data-frame in order to apply most of the tidyverse functions.

Below are formatted plots of the market data pulled for our four stocks: AMD, NVDA, INTC, and TSM. We will postpone the discussion of what several variables mean until the Exploratory Data Analysis section below.

a<-loadSymbols("AMD", from="2023-01-01", to="2024-01-01")
chartSeries(AMD,TA=c(addVo(),addBBands(),addMACD()))

my_dates <- index(AMD)
AMD <- data.frame(AMD)
AMD$AMD.Date <- my_dates
a<-loadSymbols("NVDA", from="2023-01-01", to="2024-01-01")
chartSeries(NVDA,TA=c(addVo(),addBBands(),addMACD()))

NVDA <- data.frame(NVDA)
NVDA$NVDA.Date <- my_dates
a<-loadSymbols("INTC", from="2023-01-01", to="2024-01-01")
chartSeries(INTC,TA=c(addVo(),addBBands(),addMACD()))

INTC <- data.frame(INTC)
INTC$INTC.Date <- my_dates
a<-loadSymbols("TSM", from="2023-01-01", to="2024-01-01")
chartSeries(TSM,TA=c(addVo(),addBBands(),addMACD()))

TSM <- data.frame(TSM)
TSM$TSM.Date <- my_dates

With each stock’s data loaded and typecast into a data-frame, we also want to examine the total number of predictors and observations for each individual stock. As the beginning and ending dates specified in loadSymbols are the same across our four stocks, we need only examine one of our four symbols since all predictors are the same:

dim(AMD)
## [1] 250   7

One may be initially misled by the fact that only 250 observations appear across a years worth of data; i.e. we only have 250 out of 365 days. However, this is simply a reflection of the fact that the New York Stock Exchange does not operate on weekends and a hanful of holidays. In addition, a quick analysis shows that there is no missing data among any of the four stocks; however, this is somewhat expected since stock market data is meant to be as publicly available as possible and the original features are fairly common metrics for financial institutions to collect.

vis_miss(AMD)
vis_miss(NVDA)
vis_miss(INTC)
vis_miss(TSM)

Exploratory Data Analysis

With our four stocks pulled and loaded into data frames, we first wish to examine the relevance of each predictor initially provided and then examine other possible metrics to measure the performance of a given stock by. As measuring the performance of a financial security over time relies on not only historical data, but historical data whose relevancy may depreciate depending on how recent it is (or isn’t), much of the rationale behind our added predictors in the latter half of this section revolves around study of time-series in statistics. However, since the focus of this project is on statistical-learning models and how they are applied to certain problems, several important techniques in the study of time-series (and thus stock-market analysis) will not be applied within this project.

Describing the Predictors

In the first half of this section, we first examine the initial predictors loaded into the the data frames for our stocks. For each of the four stock symbols (AMD, NVDA, INTC, and TSM), we only consider the term following the delimiter "." and our stock symbol:

  • Open: The opening price of the stock that day (specifically, the value of the stock at 9:30AM EST).
  • Close: The closing price of the stock that day (specifically, the value of the stock at 4:00PM EST)
  • Low: The minimum valuation of the stock over the given day.
  • High: The maximum valuation of the stock over the given day.
  • Volume: The total number of shares traded on a stock over the given day (i.e. both bought and sold).
  • Adjusted: The closing price of the stock that day together with any adjustments due to corporate actions such as rights offerings, dividends, and splits.

The dygraph package loaded in the section provides a compact and convenient way for us to measure most of the above predictors simultaneously for each stock:

p <- dygraph(AMD[,c(1:4)], xlab = "Date", ylab = "Price", main = "AMD Price") %>%
  dySeries("AMD.Open", label = "Open", color = "black") %>%
  dySeries("AMD.Low", label = "Low", color = "red") %>%
  dySeries("AMD.High", label = "High", color = "green") %>%
  dySeries("AMD.Close", label = "Close", color = "orange") %>%
  dyRangeSelector() %>%
  dyCandlestick()%>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 3, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = T)  %>%
  dyRoller(rollPeriod = 1)
p
p <- dygraph(NVDA[,c(1:4)], xlab = "Date", ylab = "Price", main = "NVDA Price") %>%
  dySeries("NVDA.Open", label = "Open", color = "black") %>%
  dySeries("NVDA.Low", label = "Low", color = "red") %>%
  dySeries("NVDA.High", label = "High", color = "green") %>%
  dySeries("NVDA.Close", label = "Close", color = "orange") %>%
  dyRangeSelector() %>%
  dyCandlestick()%>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 3, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = T)  %>%
  dyRoller(rollPeriod = 1)
p
p <- dygraph(INTC[,c(1:4)], xlab = "Date", ylab = "Price", main = "INTC Price") %>%
  dySeries("INTC.Open", label = "Open", color = "black") %>%
  dySeries("INTC.Low", label = "Low", color = "red") %>%
  dySeries("INTC.High", label = "High", color = "green") %>%
  dySeries("INTC.Close", label = "Close", color = "orange") %>%
  dyRangeSelector() %>%
  dyCandlestick()%>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 3, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = T)  %>%
  dyRoller(rollPeriod = 1)
p
p <- dygraph(TSM[,c(1:4)], xlab = "Date", ylab = "Price", main = "TSM Price") %>%
  dySeries("TSM.Open", label = "Open", color = "black") %>%
  dySeries("TSM.Low", label = "Low", color = "red") %>%
  dySeries("TSM.High", label = "High", color = "green") %>%
  dySeries("TSM.Close", label = "Close", color = "orange") %>%
  dyRangeSelector() %>%
  dyCandlestick()%>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 3, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = T)  %>%
  dyRoller(rollPeriod = 1)
p

Added Predictors and Metrics

While the six predictors pulled from Yahoo Finance give significant insight into each stock’s historical performance over the year, there may be other, more useful metrics that we can use to assess and predict the future growth of our securities. For example, since all additional predictors introduced rely on the analysis of recent data, the most important concept to first introduct is that of a lag variable.

Lag Values

Given a time series problem specified by events and observations \(\{(t_1, y_1), \dots, (t_n, y_n) \}\), the value of \(y_n\) in some way depends on the values of \(y_{n-1}, \dots, y_1\).

AMD$AMD.Close_L1 <- lag(AMD$AMD.Close, 1)
AMD$AMD.Close_L2 <- lag(AMD$AMD.Close, 2)
AMD$AMD.Close_L3 <- lag(AMD$AMD.Close, 3)
AMD$AMD.Close_L4 <- lag(AMD$AMD.Close, 4)
AMD$AMD.Close_L5 <- lag(AMD$AMD.Close, 5)
AMD$AMD.Close_L6 <- lag(AMD$AMD.Close, 6)
AMD$AMD.Close_L7 <- lag(AMD$AMD.Close, 7)
AMD$AMD.Close_L8 <- lag(AMD$AMD.Close, 8)
AMD$AMD.Close_L9 <- lag(AMD$AMD.Close, 9)
AMD$AMD.Close_L10 <- lag(AMD$AMD.Close, 10)



AMD <- AMD %>%
  fill(AMD.Close_L1, .direction = "up") %>%
  fill(AMD.Close_L2, .direction = "up") %>%
  fill(AMD.Close_L3, .direction = "up") %>% 
  fill(AMD.Close_L4, .direction = "up") %>% 
  fill(AMD.Close_L5, .direction = "up") %>% 
  fill(AMD.Close_L6, .direction = "up") %>% 
  fill(AMD.Close_L7, .direction = "up") %>% 
  fill(AMD.Close_L8, .direction = "up") %>% 
  fill(AMD.Close_L9, .direction = "up") %>% 
  fill(AMD.Close_L10, .direction = "up") 
NVDA$NVDA.Close_L1 <- lag(NVDA$NVDA.Close, 1)
NVDA$NVDA.Close_L2 <- lag(NVDA$NVDA.Close, 2)
NVDA$NVDA.Close_L3 <- lag(NVDA$NVDA.Close, 3)
NVDA$NVDA.Close_L4 <- lag(NVDA$NVDA.Close, 4)
NVDA$NVDA.Close_L5 <- lag(NVDA$NVDA.Close, 5)
NVDA$NVDA.Close_L6 <- lag(NVDA$NVDA.Close, 6)
NVDA$NVDA.Close_L7 <- lag(NVDA$NVDA.Close, 7)
NVDA$NVDA.Close_L8 <- lag(NVDA$NVDA.Close, 8)
NVDA$NVDA.Close_L9 <- lag(NVDA$NVDA.Close, 9)
NVDA$NVDA.Close_L10 <- lag(NVDA$NVDA.Close, 10)



NVDA <- NVDA %>%
  fill(NVDA.Close_L1, .direction = "up") %>%
  fill(NVDA.Close_L2, .direction = "up") %>%
  fill(NVDA.Close_L3, .direction = "up") %>% 
  fill(NVDA.Close_L4, .direction = "up") %>% 
  fill(NVDA.Close_L5, .direction = "up") %>% 
  fill(NVDA.Close_L6, .direction = "up") %>% 
  fill(NVDA.Close_L7, .direction = "up") %>% 
  fill(NVDA.Close_L8, .direction = "up") %>% 
  fill(NVDA.Close_L9, .direction = "up") %>% 
  fill(NVDA.Close_L10, .direction = "up")
INTC$INTC.Close_L1 <- lag(INTC$INTC.Close, 1)
INTC$INTC.Close_L2 <- lag(INTC$INTC.Close, 2)
INTC$INTC.Close_L3 <- lag(INTC$INTC.Close, 3)
INTC$INTC.Close_L4 <- lag(INTC$INTC.Close, 4)
INTC$INTC.Close_L5 <- lag(INTC$INTC.Close, 5)
INTC$INTC.Close_L6 <- lag(INTC$INTC.Close, 6)
INTC$INTC.Close_L7 <- lag(INTC$INTC.Close, 7)
INTC$INTC.Close_L8 <- lag(INTC$INTC.Close, 8)
INTC$INTC.Close_L9 <- lag(INTC$INTC.Close, 9)
INTC$INTC.Close_L10 <- lag(INTC$INTC.Close, 10)



INTC <- INTC %>%
  fill(INTC.Close_L1, .direction = "up") %>%
  fill(INTC.Close_L2, .direction = "up") %>%
  fill(INTC.Close_L3, .direction = "up") %>% 
  fill(INTC.Close_L4, .direction = "up") %>% 
  fill(INTC.Close_L5, .direction = "up") %>% 
  fill(INTC.Close_L6, .direction = "up") %>% 
  fill(INTC.Close_L7, .direction = "up") %>% 
  fill(INTC.Close_L8, .direction = "up") %>% 
  fill(INTC.Close_L9, .direction = "up") %>% 
  fill(INTC.Close_L10, .direction = "up") 
TSM$TSM.Close_L1 <- lag(TSM$TSM.Close, 1)
TSM$TSM.Close_L2 <- lag(TSM$TSM.Close, 2)
TSM$TSM.Close_L3 <- lag(TSM$TSM.Close, 3)
TSM$TSM.Close_L4 <- lag(TSM$TSM.Close, 4)
TSM$TSM.Close_L5 <- lag(TSM$TSM.Close, 5)
TSM$TSM.Close_L6 <- lag(TSM$TSM.Close, 6)
TSM$TSM.Close_L7 <- lag(TSM$TSM.Close, 7)
TSM$TSM.Close_L8 <- lag(TSM$TSM.Close, 8)
TSM$TSM.Close_L9 <- lag(TSM$TSM.Close, 9)
TSM$TSM.Close_L10 <- lag(TSM$TSM.Close, 10)


TSM <- TSM %>%
  fill(TSM.Close_L1, .direction = "up") %>%
  fill(TSM.Close_L2, .direction = "up") %>%
  fill(TSM.Close_L3, .direction = "up") %>% 
  fill(TSM.Close_L4, .direction = "up") %>% 
  fill(TSM.Close_L5, .direction = "up") %>% 
  fill(TSM.Close_L6, .direction = "up") %>% 
  fill(TSM.Close_L7, .direction = "up") %>% 
  fill(TSM.Close_L8, .direction = "up") %>% 
  fill(TSM.Close_L9, .direction = "up") %>% 
  fill(TSM.Close_L10, .direction = "up") 

Simple Moving Average of Closing Price

As consumers use historical stock data to determine whether a certain stock is worth buying or not, it becomes apparent that stocks’ price is, in one way or another, dependent on its previous value. While this is technically true for any continuous function / continuous random variable, it is clear that even long-term data can affect a stock’s current value — for example, if a stock has been in a steady downward trend for quite some time, it will negatively affect the perception of potential investors.

While there are multiple financial metrics which account for previous stock prices, this analysis will only look at two basic measurements: the simple moving average and the simple moving standard deviation (where n is some integer-valued hyper-parameter). Although there are subtle differences between the opening price and the closing price of a stock, the larger the moving value is (i.e. the number of days averaged over) the less the distinction should matter in terms of which variable to average; for consistency, we will simply base our new metrics on the closing costs of each stock.

Additionally, there is no clear choice for how much previous data to account for — should the analysis look back at a single week’s worth of data or a month? As this is itself an interesting question for the sake of tuning our models, we will consider this an added hyperparameter for the problem and consider four possible values: 1 week, 2 weeks, 1 month, and 2 months.

simple_moving_average <- function(my_vec, lag_period) {
  #' Takes the running average of a column vector
  #'
  #' Creates a new column vector whose entries are the average of the previous lag_period entries.
  #' When not enough data is available to take the average over lag_period, the closest possible 
  #' average will be taken (for example, if lag_period = 10, then the first 2nd entry of the output
  #' vector will simply be the average of the first two values, the 3rd entry of the output vector
  #' will be the average of the first three values, and so forth.)
  #'
  #' @param my_vec the column vector to take the average values of
  #' @param lag_period the number of days one wishes to average over
  #' 
  #' @return A vector whose entries represent the average of the previous lag_period entries in my_vec
  
  
  # Error handling
  if(is.vector(my_vec) == FALSE){
    stop("Not Vector: First argument of simple_moving_average must be a vector")
  }
  if(is.numeric(my_vec[1]) == FALSE){
    stop("Non-numeric Entries: values of vector in first argument must be numeric.")
  }
  if(is.numeric(lag_period) == FALSE || lag_period != round(lag_period)){
    stop("Not Integer: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
  if(lag_period <= 1){
    stop("Not Large Enough: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
 
  # return variable
  output_vec = c()
  for (i in 1:length(my_vec)) {
    # If there are less that lag_period of data previous to the current date,
    # simply take the average of all the days prior to get the closest thing
    # to a running average
    if (i <= lag_period){
      output_vec[i] = mean(my_vec[1:i])
    }
    else {
      output_vec[i] = mean(my_vec[(i-lag_period + 1):i])
    }
  }
  return(output_vec)
}
  
AMD$AMD.SMA_cl_1W <- simple_moving_average(AMD$AMD.Close, 5)
AMD$AMD.SMA_cl_2W <- simple_moving_average(AMD$AMD.Close, 10)
AMD$AMD.SMA_cl_1M <- simple_moving_average(AMD$AMD.Close, 20)
AMD$AMD.SMA_cl_2M <- simple_moving_average(AMD$AMD.Close, 40)


NVDA$NVDA.SMA_cl_1W <- simple_moving_average(NVDA$NVDA.Close, 5)
NVDA$NVDA.SMA_cl_2W <- simple_moving_average(NVDA$NVDA.Close, 10)
NVDA$NVDA.SMA_cl_1M <- simple_moving_average(NVDA$NVDA.Close, 20)
NVDA$NVDA.SMA_cl_2M <- simple_moving_average(NVDA$NVDA.Close, 40)

INTC$INTC.SMA_cl_1W <- simple_moving_average(INTC$INTC.Close, 5)
INTC$INTC.SMA_cl_2W <- simple_moving_average(INTC$INTC.Close, 10)
INTC$INTC.SMA_cl_1M <- simple_moving_average(INTC$INTC.Close, 20)
INTC$INTC.SMA_cl_2M <- simple_moving_average(INTC$INTC.Close, 40)

TSM$TSM.SMA_cl_1W <- simple_moving_average(TSM$TSM.Close, 5)
TSM$TSM.SMA_cl_2W <- simple_moving_average(TSM$TSM.Close, 10)
TSM$TSM.SMA_cl_1M <- simple_moving_average(TSM$TSM.Close, 20)
TSM$TSM.SMA_cl_2M <- simple_moving_average(TSM$TSM.Close, 40)


ggplot(data = AMD, aes(x=AMD.Date)) +
  geom_line(aes(y = AMD.SMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = AMD.SMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = AMD.SMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = AMD.SMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(AMD) Simple Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = NVDA, aes(x=NVDA.Date)) +
  geom_line(aes(y = NVDA.SMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = NVDA.SMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = NVDA.SMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = NVDA.SMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(NVDA) Simple Moving Average") +
  scale_y_continuous(  labels = label_comma()) +
  theme_dark()

ggplot(data = INTC, aes(x=INTC.Date)) +
  geom_line(aes(y = INTC.SMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = INTC.SMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = INTC.SMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = INTC.SMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(INTC) Simple Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = TSM, aes(x=TSM.Date)) +
  geom_line(aes(y = TSM.SMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = TSM.SMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = TSM.SMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = TSM.SMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(TSM) Simple Moving Average") +
  scale_y_continuous(labels = label_comma()) +
  theme_dark()

One characteristic that immediately becomes apparent is that evaluating the running averages instead of the closing costs seems to “smooth out” the curves — in other words, the running average is much more stable and is not affected by a share’s volatility as much as our original predictors obtained from the CSV. In fact, what we are actually doing is slowly interpolating the data with the overall average; since the overall average is a constant function (and thus linear), the “smoothing out” process is simply a result of interpolating with a \(C^\infty(\mathbb{R})\) (smooth) function.

Exponential Moving Average

exponential_moving_average_helper <- function(my_vec, lag_period, smoothing_factor) {
  #' Helper function to evaluate the exponential moving average over a fixed period 
  #' using an array buffer
  #'
  #' Creates a new column vector whose entries are the average of the previous lag_period entries.
  #' When not enough data is available to take the average over lag_period, the closest possible 
  #' average will be taken (for example, if lag_period = 10, then the first 2nd entry of the output
  #' vector will simply be the average of the first two values, the 3rd entry of the output vector
  #' will be the average of the first three values, and so forth.)
  #'
  #' @param my_vec the column vector to take the average values of
  #' @param lag_period the number of days one wishes to average over
  #' 
  #' @return A vector whose entries represent the average of the previous lag_period entries in my_vec
  
  
  # Error handling
  if(is.vector(my_vec) == FALSE){
    stop("Not Vector: First argument of simple_moving_average must be a vector")
  }
  if(is.numeric(my_vec[1]) == FALSE){
    stop("Non-numeric Entries: values of vector in first argument must be numeric.")
  }
  if(is.numeric(lag_period) == FALSE || lag_period != round(lag_period)){
    stop("Not Integer: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
  if(lag_period <= 1){
    stop("Not Large Enough: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
 
  vec_len <- length(my_vec)
  output_vec <- c()
  output_vec[1] <- my_vec[1]
  
  for (i in 2:vec_len) {
    output_vec[i] <- smoothing_factor * my_vec[i] + (1 - smoothing_factor) * output_vec[i-1]
  }
  
  return(output_vec[vec_len])
}

exponential_moving_average <- function(my_vec, lag_period) {
  #' Takes the running average of a column vector
  #'
  #' Creates a new column vector whose entries are the average of the previous lag_period entries.
  #' When not enough data is available to take the average over lag_period, the closest possible 
  #' average will be taken (for example, if lag_period = 10, then the first 2nd entry of the output
  #' vector will simply be the average of the first two values, the 3rd entry of the output vector
  #' will be the average of the first three values, and so forth.)
  #'
  #' @param my_vec the column vector to take the average values of
  #' @param lag_period the number of days one wishes to average over
  #' 
  #' @return A vector whose entries represent the average of the previous lag_period entries in my_vec
  
  
  # Error handling
  if(is.vector(my_vec) == FALSE){
    stop("Not Vector: First argument of simple_moving_average must be a vector")
  }
  if(is.numeric(my_vec[1]) == FALSE){
    stop("Non-numeric Entries: values of vector in first argument must be numeric.")
  }
  if(is.numeric(lag_period) == FALSE || lag_period != round(lag_period)){
    stop("Not Integer: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
  if(lag_period <= 1){
    stop("Not Large Enough: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
 
  output_vec <- c()
  output_vec[1] <- my_vec[1]
  smoothing_factor = 2/(lag_period+1)
  for (i in 2:length(my_vec)) {
    # If there are less that lag_period of data previous to the current date,
    # simply take the average of all the days prior to get the closest thing
    # to a running average
    if (i <= lag_period){
      output_vec[i] = exponential_moving_average_helper(my_vec[1:i], i, 2/(i + 1))
    }
    else {
      output_vec[i] = exponential_moving_average_helper(my_vec[(i-lag_period + 1):i], lag_period, smoothing_factor)
    }
  }
  
  return(output_vec)
}


AMD$AMD.EMA_cl_1W <- exponential_moving_average(AMD$AMD.Close, 5)
AMD$AMD.EMA_cl_2W <- exponential_moving_average(AMD$AMD.Close, 10)
AMD$AMD.EMA_cl_1M <- exponential_moving_average(AMD$AMD.Close, 20)
AMD$AMD.EMA_cl_2M <- exponential_moving_average(AMD$AMD.Close, 40)

NVDA$NVDA.EMA_cl_1W <- exponential_moving_average(NVDA$NVDA.Close, 5)
NVDA$NVDA.EMA_cl_2W <- exponential_moving_average(NVDA$NVDA.Close, 10)
NVDA$NVDA.EMA_cl_1M <- exponential_moving_average(NVDA$NVDA.Close, 20)
NVDA$NVDA.EMA_cl_2M <- exponential_moving_average(NVDA$NVDA.Close, 40)

INTC$INTC.EMA_cl_1W <- exponential_moving_average(INTC$INTC.Close, 5)
INTC$INTC.EMA_cl_2W <- exponential_moving_average(INTC$INTC.Close, 10)
INTC$INTC.EMA_cl_1M <- exponential_moving_average(INTC$INTC.Close, 20)
INTC$INTC.EMA_cl_2M <- exponential_moving_average(INTC$INTC.Close, 40)

TSM$TSM.EMA_cl_1W <- exponential_moving_average(TSM$TSM.Close, 5)
TSM$TSM.EMA_cl_2W <- exponential_moving_average(TSM$TSM.Close, 10)
TSM$TSM.EMA_cl_1M <- exponential_moving_average(TSM$TSM.Close, 20)
TSM$TSM.EMA_cl_2M <- exponential_moving_average(TSM$TSM.Close, 40)

ggplot(data = AMD, aes(x=AMD.Date)) +
  geom_line(aes(y = AMD.EMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = AMD.EMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = AMD.EMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = AMD.EMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(AMD) Exponential Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = NVDA, aes(x=NVDA.Date)) +
  geom_line(aes(y = NVDA.EMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = NVDA.EMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = NVDA.EMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = NVDA.EMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(NVDA) Exponential Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = INTC, aes(x=INTC.Date)) +
  geom_line(aes(y = INTC.EMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = INTC.EMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = INTC.EMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = INTC.EMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(INTC) Exponential Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = TSM, aes(x=TSM.Date)) +
  geom_line(aes(y = TSM.EMA_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = TSM.EMA_cl_2W, color = '2-Week')) +
  geom_line(aes(y = TSM.EMA_cl_1M, color = '1-Month')) +
  geom_line(aes(y = TSM.EMA_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(TSM) Exponential Moving Average") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

Simple Moving Deviation of Closing Price

With a concrete notion of the simple moving average closing price of a stock, it is natural to measure the standard deviation as well to gain an accurate insight on the volatility of each stock.

simple_moving_deviation <- function(my_vec, lag_period) {
  #' Takes the running standard deviation of a column vector
  #'
  #' Creates a new column vector whose entries are the standard deviation of the previous lag_period entries.
  #' When not enough data is available to take the deviation over lag_period, the closest possible 
  #' average will be taken (for example, if lag_period = 10, then the first 2nd entry of the output
  #' vector will simply be the average of the first two values, the 3rd entry of the output vector
  #' will be the average of the first three values, and so forth.)
  #'
  #' @param my_vec the column vector to take the standard deviation of
  #' @param lag_period the number of days one wishes to average over
  #' 
  #' @return A vector whose entries represent the standard deviation of the previous lag_period entries in my_vec
  
  
  # Error handling
  if(is.vector(my_vec) == FALSE){
    stop("Not Vector: First argument of simple_moving_average must be a vector")
  }
  if(is.numeric(my_vec[1]) == FALSE){
    stop("Non-numeric Entries: values of vector in first argument must be numeric.")
  }
  if(is.numeric(lag_period) == FALSE || lag_period != round(lag_period)){
    stop("Not Integer: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }
  if(lag_period <= 1){
    stop("Not Large Enough: Second argument of simple_moving_average must be an integer larger than or equal to 2")
  }

  # return variable
  output_vec = c()
  
  # Setting the first standard deviation to 0 and beginning the loop
  # at 2 prevents a divide by 0 error without adding an additional if-else branch
  # in the loop
  output_vec[1] = 0
  for (i in 2:length(my_vec)) {
    
    # If there are less that lag_period of data previous to the current date,
    # simply take the average of all the days prior to get the closest thing
    # to a running average
    if (i <= lag_period){
       output_vec[i] = sd(my_vec[1:i])
    }
    else {
      output_vec[i] = sd(my_vec[(i-lag_period+1):i])
    }
  }
  return(output_vec)
}


AMD$AMD.SMD_cl_1W <- simple_moving_deviation(AMD$AMD.Close, 5)
AMD$AMD.SMD_cl_2W <- simple_moving_deviation(AMD$AMD.Close, 10)
AMD$AMD.SMD_cl_1M <- simple_moving_deviation(AMD$AMD.Close, 20)
AMD$AMD.SMD_cl_2M <- simple_moving_deviation(AMD$AMD.Close, 40)

NVDA$NVDA.SMD_cl_1W <- simple_moving_deviation(NVDA$NVDA.Close, 5)
NVDA$NVDA.SMD_cl_2W <- simple_moving_deviation(NVDA$NVDA.Close, 10)
NVDA$NVDA.SMD_cl_1M <- simple_moving_deviation(NVDA$NVDA.Close, 20)
NVDA$NVDA.SMD_cl_2M <- simple_moving_deviation(NVDA$NVDA.Close, 40)

INTC$INTC.SMD_cl_1W <- simple_moving_deviation(INTC$INTC.Close, 5)
INTC$INTC.SMD_cl_2W <- simple_moving_deviation(INTC$INTC.Close, 10)
INTC$INTC.SMD_cl_1M <- simple_moving_deviation(INTC$INTC.Close, 20)
INTC$INTC.SMD_cl_2M <- simple_moving_deviation(INTC$INTC.Close, 40)

TSM$TSM.SMD_cl_1W <- simple_moving_deviation(TSM$TSM.Close, 5)
TSM$TSM.SMD_cl_2W <- simple_moving_deviation(TSM$TSM.Close, 10)
TSM$TSM.SMD_cl_1M <- simple_moving_deviation(TSM$TSM.Close, 20)
TSM$TSM.SMD_cl_2M <- simple_moving_deviation(TSM$TSM.Close, 40)

ggplot(data = AMD, aes(x=AMD.Date)) +
  geom_line(aes(y = AMD.SMD_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = AMD.SMD_cl_2W, color = '2-Week')) +
  geom_line(aes(y = AMD.SMD_cl_1M, color = '1-Month')) +
  geom_line(aes(y = AMD.SMD_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(AMD) Simple Moving Deviation") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = NVDA, aes(x=NVDA.Date)) +
  geom_line(aes(y = NVDA.SMD_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = NVDA.SMD_cl_2W, color = '2-Week')) +
  geom_line(aes(y = NVDA.SMD_cl_1M, color = '1-Month')) +
  geom_line(aes(y = NVDA.SMD_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(NVDA) Simple Moving Deviation") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = INTC, aes(x=INTC.Date)) +
  geom_line(aes(y = INTC.SMD_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = INTC.SMD_cl_2W, color = '2-Week')) +
  geom_line(aes(y = INTC.SMD_cl_1M, color = '1-Month')) +
  geom_line(aes(y = INTC.SMD_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(INTC) Simple Moving Deviation") +
  scale_y_continuous(  labels = label_comma()) +
  theme_dark()

ggplot(data = TSM, aes(x=TSM.Date)) +
  geom_line(aes(y = TSM.SMD_cl_1W, color = '1-Week')) + 
  geom_line(aes(y = TSM.SMD_cl_2W, color = '2-Week')) +
  geom_line(aes(y = TSM.SMD_cl_1M, color = '1-Month')) +
  geom_line(aes(y = TSM.SMD_cl_2M, color = '2-Month')) +
  ylab('USD') +
  scale_color_manual(values = c(
    '1-Week' = 'firebrick1',
    '2-Week' = 'chocolate1',
    '1-Month' = 'gold',
    '2-Month' = 'chartreuse'
  )) +
  xlab('Date') +
  ggtitle("(TSM) Simple Moving Deviation") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

Moving Average Convergence Divergence (MACD)

AMD$AMD.MACD = (AMD$AMD.EMA_cl_2W - AMD$AMD.EMA_cl_1M)
NVDA$NVDA.MACD = (NVDA$NVDA.EMA_cl_2W - NVDA$NVDA.EMA_cl_1M)
INTC$INTC.MACD = (INTC$INTC.EMA_cl_2W - INTC$INTC.EMA_cl_1M)
TSM$TSM.MACD = (TSM$TSM.EMA_cl_2W - TSM$TSM.EMA_cl_1M)

ggplot(data = AMD, aes(x=AMD.Date)) +
  geom_line(aes(y = AMD.MACD, color = 'MACD')) +
  ylab('USD') +
  scale_color_manual(values = c(
    'MACD' = 'aquamarine'
  )) +
  xlab('Date') +
  ggtitle("AMD MACD Line") +
  scale_y_continuous(  labels = label_comma()) +
  theme_dark()

ggplot(data = NVDA, aes(x=NVDA.Date)) +
  geom_line(aes(y = NVDA.MACD, color = 'MACD')) +
  ylab('USD') +
  scale_color_manual(values = c(
    'MACD' = 'aquamarine'
  )) +
  xlab('Date') +
  ggtitle("NVDA MACD Line") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = INTC, aes(x=INTC.Date)) +
  geom_line(aes(y = INTC.MACD, color = 'MACD')) +
  ylab('USD') +
  scale_color_manual(values = c(
    'MACD' = 'aquamarine'
  )) +
  xlab('Date') +
  ggtitle("INTC MACD Line") +
  scale_y_continuous( labels = label_comma()) +
  theme_dark()

ggplot(data = TSM, aes(x=TSM.Date)) +
  geom_line(aes(y = TSM.MACD, color = 'MACD')) +
  ylab('USD') +
  scale_color_manual(values = c(
    'MACD' = 'aquamarine'
  )) +
  xlab('Date') +
  ggtitle("TSM MACD Line") +
  scale_y_continuous(  labels = label_comma()) +
  theme_dark()

Data Correlation

While having a large array of predictors is in some sense useful for seeing the whole picture of the semiconductor market for the 2023-2024 fiscal year, there is also a potentially significant amount of unnecessary information. As mentioned prior, the behavior of many of our initial predictors coming from the CSV files are very closely related to one another — the closing price one day is directly tied to the opening price of the following day, and if a stock’s minimum / Low value is increasing that generally means all 4 other predictors (aside from volume) are increasing as well. In addition, comparing the performance between two stocks is generally going to be heavily correlated due to the fact that they both follow the underlying market’s climate.

Ultimately, in order to achieve a good understanding of the correlations between all of our predictors we will need to cross examine several subsets of our predictors to see which predictors are correlated for a single stock, and which predictors are useful for measuring competition between stocks. Dividing our correlation plots into two types, we first examine how the predictors are correlated for a fixed stock, and test this underlying trend accross a subset of our stocks (ASML, INTC, NVDA, and NXPI ):

select(AMD, is.numeric) %>%
  cor() %>%
  corrplot(method = "circle", type = "lower", diag = FALSE, tl.cex=0.6, title="Correlation Plot")

select(NVDA, NVDA.Close, NVDA.Volume, NVDA.SMD_cl_1W, NVDA.SMD_cl_2W, NVDA.SMD_cl_1M, NVDA.SMD_cl_2M, NVDA.MACD) %>%
  cor() %>%
  corrplot(method = "circle", type = "lower", diag = FALSE, tl.cex=0.6, title="NVDA Correlation Plot" )
select(INTC, INTC.Close, INTC.Volume, INTC.SMD_cl_1W, INTC.SMD_cl_2W, INTC.SMD_cl_1M, INTC.SMD_cl_2M, INTC.MACD) %>%
  cor() %>%
  corrplot(method = "circle", type = "lower", diag = FALSE, tl.cex=0.6, title="INTC Correlation Plot")
select(TSM, TSM.Close, TSM.Volume, TSM.SMD_cl_1W, TSM.SMD_cl_2W, TSM.SMD_cl_1M, TSM.SMD_cl_2M, TSM.MACD) %>%
  cor() %>%
  corrplot(method = "circle", type = "lower", diag = FALSE, tl.cex=0.6, title="TSM Correlation Plot")

Setting Up Models

With a better picture in mind of how our stock prices can be measured from both the given metrics and how they interact with one another, we can now set up our data and begin training our models. This will be done in several steps, first preparing the data to ensure that our models do not become over-fitted to a particular data-set.

Data Split

One of the primary ways we ensure robustness of our models is by partitioning our data into training and testing data. Foremost, this ensures that our model does not become overfit to the details and noise of our underlying data-set by introducing a portion of the data which is unseen during the training phase (i.e. the testing data). Ultimately, one would want outcome variable to have similar statistics / variance across both the training and testing sets — this is accomplished by stratifying our split about the desired outcome variable.

AMD_split <- initial_split(AMD, prop = 0.7,
                                strata = AMD.Close)
AMD_train <- training(AMD_split)
AMD_test <- testing(AMD_split)
NVDA_split <- initial_split(NVDA, prop = 0.7,
                                strata = NVDA.Close)
NVDA_train <- training(NVDA_split)
NVDA_test <- testing(NVDA_split)
INTC_split <- initial_split(INTC, prop = 0.7,
                                strata = INTC.Close)
INTC_train <- training(INTC_split)
INTC_test <- testing(INTC_split)
TSM_split <- initial_split(TSM, prop = 0.7,
                                strata = TSM.Close)
TSM_train <- training(TSM_split)
TSM_test <- testing(TSM_split)

Model Fitting

AMD_recipe = recipe(AMD.Close ~ AMD.Close_L1 + AMD.Close_L2 + AMD.Close_L3 + AMD.Close_L4 +
                      AMD.Close_L5 + AMD.Close_L6 + AMD.Close_L7 + AMD.Close_L8 + AMD.Close_L9 +
                      AMD.Close_L10 + AMD.SMD_cl_1W + AMD.SMA_cl_1W + AMD.MACD,
                    data=AMD_train) %>%
  step_interact(terms= ~ AMD.Close_L1:AMD.Close_L2) %>%
  step_interact(terms= ~ AMD.Close_L2:AMD.Close_L3) %>%
  step_interact(terms= ~ AMD.Close_L3:AMD.Close_L4) %>%
  step_interact(terms= ~ AMD.Close_L4:AMD.Close_L5) %>%
  step_interact(terms= ~ AMD.Close_L5:AMD.Close_L6) %>%
  step_interact(terms= ~ AMD.Close_L6:AMD.Close_L7) %>%
  step_interact(terms= ~ AMD.Close_L7:AMD.Close_L8) %>%
  step_interact(terms= ~ AMD.Close_L8:AMD.Close_L9) %>%
  step_interact(terms= ~ AMD.Close_L9:AMD.Close_L10) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
NVDA_recipe = recipe(NVDA.Close ~ NVDA.Close_L1 + NVDA.Close_L2 + NVDA.Close_L3 + NVDA.Close_L4 + NVDA.Close_L5 + 
                       NVDA.Close_L6 + NVDA.Close_L7 + NVDA.Close_L8 + NVDA.Close_L9 + NVDA.Close_L10 +  
                      NVDA.SMD_cl_1W + NVDA.SMA_cl_1W  + NVDA.MACD,
                    data=NVDA_train) %>%
  step_interact(terms= ~ NVDA.Close_L1:NVDA.Close_L2) %>%
  step_interact(terms= ~ NVDA.Close_L2:NVDA.Close_L3) %>%
  step_interact(terms= ~ NVDA.Close_L3:NVDA.Close_L4) %>%
  step_interact(terms= ~ NVDA.Close_L4:NVDA.Close_L5) %>%
  step_interact(terms= ~ NVDA.Close_L5:NVDA.Close_L6) %>%
  step_interact(terms= ~ NVDA.Close_L6:NVDA.Close_L7) %>%
  step_interact(terms= ~ NVDA.Close_L7:NVDA.Close_L8) %>%
  step_interact(terms= ~ NVDA.Close_L8:NVDA.Close_L9) %>%
  step_interact(terms= ~ NVDA.Close_L9:NVDA.Close_L10) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
INTC_recipe = recipe(INTC.Close ~ INTC.Close_L1 + INTC.Close_L2 + INTC.Close_L3 + INTC.Close_L4 +
                      INTC.Close_L5 + INTC.Close_L6 + INTC.Close_L7 + INTC.Close_L8 + INTC.Close_L9 + INTC.Close_L10  +
                      INTC.SMD_cl_1W + INTC.SMA_cl_1W + INTC.EMA_cl_1W + INTC.MACD,
                    data=INTC_train) %>%
  step_interact(terms= ~ INTC.Close_L1:INTC.Close_L2) %>%
  step_interact(terms= ~ INTC.Close_L2:INTC.Close_L3) %>%
  step_interact(terms= ~ INTC.Close_L3:INTC.Close_L4) %>%
  step_interact(terms= ~ INTC.Close_L4:INTC.Close_L5) %>%
  step_interact(terms= ~ INTC.Close_L5:INTC.Close_L6) %>%
  step_interact(terms= ~ INTC.Close_L6:INTC.Close_L7) %>%
  step_interact(terms= ~ INTC.Close_L7:INTC.Close_L8) %>%
  step_interact(terms= ~ INTC.Close_L8:INTC.Close_L9) %>%
  step_interact(terms= ~ INTC.Close_L9:INTC.Close_L10) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
TSM_recipe = recipe(TSM.Close ~ TSM.Close_L1 + TSM.Close_L2 + TSM.Close_L3 + TSM.Close_L4 +
                      TSM.Close_L5 + TSM.Close_L6 + TSM.Close_L7 + TSM.Close_L8 + TSM.Close_L9 + TSM.Close_L10 +
                      TSM.SMD_cl_1W + TSM.SMA_cl_1W + TSM.EMA_cl_1W + TSM.MACD,
                    data=TSM_train) %>%
  step_interact(terms= ~ TSM.Close_L1:TSM.Close_L2) %>%
  step_interact(terms= ~ TSM.Close_L2:TSM.Close_L3) %>%
  step_interact(terms= ~ TSM.Close_L3:TSM.Close_L4) %>%
  step_interact(terms= ~ TSM.Close_L4:TSM.Close_L5) %>%
  step_interact(terms= ~ TSM.Close_L5:TSM.Close_L6) %>%
  step_interact(terms= ~ TSM.Close_L6:TSM.Close_L7) %>%
  step_interact(terms= ~ TSM.Close_L7:TSM.Close_L8) %>%
  step_interact(terms= ~ TSM.Close_L8:TSM.Close_L9) %>%
  step_interact(terms= ~ TSM.Close_L9:TSM.Close_L10) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

k-Fold Cross Validation

AMD_folds  <- vfold_cv(AMD_train, v = 10, strata = AMD.Close)
NVDA_folds  <- vfold_cv(NVDA_train, v = 10, strata = NVDA.Close)
INTC_folds  <- vfold_cv(INTC_train, v = 10, strata = INTC.Close)
TSM_folds  <- vfold_cv(TSM_train, v = 10, strata = TSM.Close)

Fitting the Models

# Linear Regression
lm_model <- linear_reg() %>%
  set_engine("lm")


# Ridge Regression
ridge_model <- linear_reg(mixture = 0,
                         penalty = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

# Lasso Regression
lasso_model <- linear_reg(mixture = 1,
                         penalty = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")


# Elastic Net
elastic_net_model <- linear_reg(mixture = tune(),
                              penalty = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

# k-Nearest Neighbors
knn_model <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("regression")

Set Up Workflows

# Linear Regression Workflows
lm_wflow_AMD <- workflow() %>%
  add_model(lm_model) %>%
  add_recipe(AMD_recipe)
 
# Ridge Regression Workflows
ridge_wflow_AMD <- workflow() %>%
  add_model(ridge_model) %>%
  add_recipe(AMD_recipe)
 
# Lasso Regression Workflows
lasso_wflow_AMD <- workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(AMD_recipe)
 
# Elastic Net Workflows
elastic_net_wflow_AMD <- workflow() %>%
  add_model(elastic_net_model) %>%
  add_recipe(AMD_recipe)
  
# k-Nearest Neighbors Workflows
knn_wflow_AMD <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(AMD_recipe)
# Linear Regression Workflows
lm_wflow_NVDA <- workflow() %>%
  add_model(lm_model) %>%
  add_recipe(NVDA_recipe)
 
# Ridge Regression Workflows
ridge_wflow_NVDA <- workflow() %>%
  add_model(ridge_model) %>%
  add_recipe(NVDA_recipe)
 
# Lasso Regression Workflows
lasso_wflow_NVDA <- workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(NVDA_recipe)
 
# Elastic Net Workflows
elastic_net_wflow_NVDA <- workflow() %>%
  add_model(elastic_net_model) %>%
  add_recipe(NVDA_recipe)
  
# k-Nearest Neighbors Workflows
knn_wflow_NVDA <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(NVDA_recipe)
# Linear Regression Workflows
lm_wflow_INTC <- workflow() %>%
  add_model(lm_model) %>%
  add_recipe(INTC_recipe)
 
# Ridge Regression Workflows
ridge_wflow_INTC <- workflow() %>%
  add_model(ridge_model) %>%
  add_recipe(INTC_recipe)
 
# Lasso Regression Workflows
lasso_wflow_INTC <- workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(INTC_recipe)
 
# Elastic Net Workflows
elastic_net_wflow_INTC <- workflow() %>%
  add_model(elastic_net_model) %>%
  add_recipe(INTC_recipe)
  
# k-Nearest Neighbors Workflows
knn_wflow_INTC <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(INTC_recipe)
# Linear Regression Workflows
lm_wflow_TSM <- workflow() %>%
  add_model(lm_model) %>%
  add_recipe(TSM_recipe)
 
# Ridge Regression Workflows
ridge_wflow_TSM <- workflow() %>%
  add_model(ridge_model) %>%
  add_recipe(TSM_recipe)
 
# Lasso Regression Workflows
lasso_wflow_TSM <- workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(TSM_recipe)
 
# Elastic Net Workflows
elastic_net_wflow_TSM <- workflow() %>%
  add_model(elastic_net_model) %>%
  add_recipe(TSM_recipe)
  
# k-Nearest Neighbors Workflows
knn_wflow_TSM <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(TSM_recipe)

Hyperparameter Tuning

Set up Grids:

# Grid for Ridge Regression and Lasso Regression
no_mixture_grid <- grid_regular(penalty(range = c(0,1)), levels = 50)

# Grid for Elastic Net
elastic_net_grid <- grid_regular(penalty(range = c(0, 1),
                                     trans = identity_trans()),
                        mixture(range = c(0, 1)),
                             levels = 10)

# k-Nearest Neighbors Net
knn_grid <- grid_regular(neighbors(range = c(2,20)), levels = 10)
# Find optimal parameters for ridge regression
ridge_tune_AMD <- tune_grid(
  ridge_wflow_AMD,
  resamples = AMD_folds,
  grid = no_mixture_grid
)
ridge_final_wflow_AMD  <- select_best(ridge_tune_AMD , metric="rmse" ) %>%
  finalize_workflow(x=ridge_wflow_AMD )

# Find optimal parameters for lasso regression
lasso_tune_AMD  <- tune_grid(
  lasso_wflow_AMD ,
  resamples = AMD_folds,
  grid = no_mixture_grid
)
lasso_final_wflow_AMD  <- select_best(lasso_tune_AMD , metric="rmse") %>%
  finalize_workflow(x=lasso_wflow_AMD )

# Find optimal parameters for Elastic Net
elastic_net_tune_AMD  <- tune_grid(
  elastic_net_wflow_AMD ,
  resamples = AMD_folds,
  grid = elastic_net_grid
)
elastic_net_final_wflow_AMD  <- select_best(elastic_net_tune_AMD , metric = "rmse") %>%
  finalize_workflow(x=elastic_net_wflow_AMD )

# Find optimal parameters for k-Nearest Neighbors
knn_tune_AMD  <- tune_grid(
    knn_wflow_AMD ,
    resamples = AMD_folds,
    grid = knn_grid
)
knn_final_wflow_AMD  <- select_best(knn_tune_AMD , metric = "rmse") %>%
  finalize_workflow(x=knn_wflow_AMD )
# Find optimal parameters for ridge regression
ridge_tune_NVDA <- tune_grid(
  ridge_wflow_NVDA,
  resamples = NVDA_folds,
  grid = no_mixture_grid
)
ridge_final_wflow_NVDA  <- select_best(ridge_tune_NVDA , metric="rmse" ) %>%
  finalize_workflow(x=ridge_wflow_NVDA )

# Find optimal parameters for lasso regression
lasso_tune_NVDA  <- tune_grid(
  lasso_wflow_NVDA ,
  resamples = NVDA_folds,
  grid = no_mixture_grid
)
lasso_final_wflow_NVDA  <- select_best(lasso_tune_NVDA , metric="rmse") %>%
  finalize_workflow(x=lasso_wflow_NVDA )

# Find optimal parameters for Elastic Net
elastic_net_tune_NVDA  <- tune_grid(
  elastic_net_wflow_NVDA ,
  resamples = NVDA_folds,
  grid = elastic_net_grid
)
elastic_net_final_wflow_NVDA  <- select_best(elastic_net_tune_NVDA , metric = "rmse") %>%
  finalize_workflow(x=elastic_net_wflow_NVDA )

# Find optimal parameters for k-Nearest Neighbors
knn_tune_NVDA  <- tune_grid(
    knn_wflow_NVDA ,
    resamples = NVDA_folds,
    grid = knn_grid
)
knn_final_wflow_NVDA  <- select_best(knn_tune_NVDA , metric = "rmse") %>%
  finalize_workflow(x=knn_wflow_NVDA )
# Find optimal parameters for ridge regression
ridge_tune_INTC <- tune_grid(
  ridge_wflow_INTC,
  resamples = INTC_folds,
  grid = no_mixture_grid
)
ridge_final_wflow_INTC  <- select_best(ridge_tune_INTC , metric="rmse" ) %>%
  finalize_workflow(x=ridge_wflow_INTC )

# Find optimal parameters for lasso regression
lasso_tune_INTC  <- tune_grid(
  lasso_wflow_INTC ,
  resamples = INTC_folds,
  grid = no_mixture_grid
)
## → A | warning: A correlation computation is required, but `estimate` is constant and has 0
##                standard deviation, resulting in a divide by 0 error. `NA` will be returned.
## There were issues with some computations   A: x1There were issues with some computations   A: x2There were issues with some computations   A: x3There were issues with some computations   A: x4There were issues with some computations   A: x5There were issues with some computations   A: x6There were issues with some computations   A: x7There were issues with some computations   A: x8There were issues with some computations   A: x9There were issues with some computations   A: x10There were issues with some computations   A: x10
lasso_final_wflow_INTC  <- select_best(lasso_tune_INTC , metric="rmse") %>%
  finalize_workflow(x=lasso_wflow_INTC )

# Find optimal parameters for Elastic Net
elastic_net_tune_INTC  <- tune_grid(
  elastic_net_wflow_INTC ,
  resamples = INTC_folds,
  grid = elastic_net_grid
)
elastic_net_final_wflow_INTC  <- select_best(elastic_net_tune_INTC , metric = "rmse") %>%
  finalize_workflow(x=elastic_net_wflow_INTC )

# Find optimal parameters for k-Nearest Neighbors
knn_tune_INTC  <- tune_grid(
    knn_wflow_INTC ,
    resamples = INTC_folds,
    grid = knn_grid
)
knn_final_wflow_INTC  <- select_best(knn_tune_INTC , metric = "rmse") %>%
  finalize_workflow(x=knn_wflow_INTC )
# Find optimal parameters for ridge regression
ridge_tune_TSM <- tune_grid(
  ridge_wflow_TSM,
  resamples = TSM_folds,
  grid = no_mixture_grid
)
ridge_final_wflow_TSM  <- select_best(ridge_tune_TSM , metric="rmse" ) %>%
  finalize_workflow(x=ridge_wflow_TSM )

# Find optimal parameters for lasso regression
lasso_tune_TSM  <- tune_grid(
  lasso_wflow_TSM ,
  resamples = TSM_folds,
  grid = no_mixture_grid
)
## → A | warning: A correlation computation is required, but `estimate` is constant and has 0
##                standard deviation, resulting in a divide by 0 error. `NA` will be returned.
## There were issues with some computations   A: x1There were issues with some computations   A: x2There were issues with some computations   A: x3There were issues with some computations   A: x4There were issues with some computations   A: x5There were issues with some computations   A: x6There were issues with some computations   A: x7There were issues with some computations   A: x8There were issues with some computations   A: x9There were issues with some computations   A: x10There were issues with some computations   A: x10
lasso_final_wflow_TSM  <- select_best(lasso_tune_TSM , metric="rmse") %>%
  finalize_workflow(x=lasso_wflow_TSM )

# Find optimal parameters for Elastic Net
elastic_net_tune_TSM  <- tune_grid(
  elastic_net_wflow_TSM ,
  resamples = TSM_folds,
  grid = elastic_net_grid
)
elastic_net_final_wflow_TSM  <- select_best(elastic_net_tune_TSM , metric = "rmse") %>%
  finalize_workflow(x=elastic_net_wflow_TSM )

# Find optimal parameters for k-Nearest Neighbors
knn_tune_TSM  <- tune_grid(
    knn_wflow_TSM ,
    resamples = TSM_folds,
    grid = knn_grid
)
knn_final_wflow_TSM  <- select_best(knn_tune_TSM , metric = "rmse") %>%
  finalize_workflow(x=knn_wflow_TSM )

Model Fitting

# Linear Regression Fits
lm_fit_AMD  <- fit(lm_wflow_AMD, AMD_train)
 
# Ridge Regression Fits
ridge_fit_AMD  <- fit(ridge_final_wflow_AMD , AMD_train)
 
# Lasso Regression Fits
lasso_fit_AMD  <- fit(lasso_final_wflow_AMD , AMD_train)

# Elastic Net Fits
elastic_net_fit_AMD <- fit(elastic_net_final_wflow_AMD , AMD_train)
 
# k-Nearest Neighbors Fit
knn_fit_AMD  <- fit(knn_final_wflow_AMD , AMD_train)
# Linear Regression Fits
lm_fit_NVDA  <- fit(lm_wflow_NVDA, NVDA_train)
 
# Ridge Regression Fits
ridge_fit_NVDA  <- fit(ridge_final_wflow_NVDA , NVDA_train)
 
# Lasso Regression Fits
lasso_fit_NVDA  <- fit(lasso_final_wflow_NVDA , NVDA_train)

# Elastic Net Fits
elastic_net_fit_NVDA <- fit(elastic_net_final_wflow_NVDA , NVDA_train)
 
# k-Nearest Neighbors Fit
knn_fit_NVDA  <- fit(knn_final_wflow_NVDA , NVDA_train)
# Linear Regression Fits
lm_fit_INTC  <- fit(lm_wflow_INTC, INTC_train)
 
# Ridge Regression Fits
ridge_fit_INTC  <- fit(ridge_final_wflow_INTC , INTC_train)
 
# Lasso Regression Fits
lasso_fit_INTC  <- fit(lasso_final_wflow_INTC , INTC_train)

# Elastic Net Fits
elastic_net_fit_INTC <- fit(elastic_net_final_wflow_INTC , INTC_train)
 
# k-Nearest Neighbors Fit
knn_fit_INTC  <- fit(knn_final_wflow_INTC , INTC_train)
# Linear Regression Fits
lm_fit_TSM  <- fit(lm_wflow_TSM, TSM_train)
 
# Ridge Regression Fits
ridge_fit_TSM  <- fit(ridge_final_wflow_TSM , TSM_train)
 
# Lasso Regression Fits
lasso_fit_TSM  <- fit(lasso_final_wflow_TSM , TSM_train)

# Elastic Net Fits
elastic_net_fit_TSM <- fit(elastic_net_final_wflow_TSM , TSM_train)
 
# k-Nearest Neighbors Fit
knn_fit_TSM  <- fit(knn_final_wflow_TSM , TSM_train)

Model Results

# Linear Regression Training
lm_train_res_AMD  <- predict(lm_fit_AMD , new_data = AMD_train %>% select(-AMD.Close))
lm_train_res_AMD  <- bind_cols(lm_train_res_AMD , AMD_train %>% select(AMD.Close))
 
# Ridge Regression Training
ridge_train_res_AMD  <- predict(ridge_fit_AMD , new_data = AMD_train %>% select(-AMD.Close))
ridge_train_res_AMD  <- bind_cols(ridge_train_res_AMD , AMD_train %>% select(AMD.Close))
 
# Lasso Regression Training
lasso_train_res_AMD  <- predict(lasso_fit_AMD , new_data =  AMD_train  %>% select(-AMD.Close))
lasso_train_res_AMD  <- bind_cols(lasso_train_res_AMD ,  AMD_train  %>% select(AMD.Close))

# Elastic Net Training
elastic_net_train_res_AMD  <- predict(elastic_net_fit_AMD , new_data =  AMD_train  %>% select(-AMD.Close ))
elastic_net_train_res_AMD  <- bind_cols(elastic_net_train_res_AMD ,  AMD_train  %>% select(AMD.Close ))

# k-Nearest Neighbors Training
knn_train_res_AMD  <- predict(knn_fit_AMD , new_data =  AMD_train  %>% select(-AMD.Close ))
knn_train_res_AMD  <- bind_cols(knn_train_res_AMD ,  AMD_train  %>% select(AMD.Close ))
# Linear Regression Training
lm_train_res_NVDA  <- predict(lm_fit_NVDA , new_data = NVDA_train %>% select(-NVDA.Close))
lm_train_res_NVDA  <- bind_cols(lm_train_res_NVDA , NVDA_train %>% select(NVDA.Close))
 
# Ridge Regression Training
ridge_train_res_NVDA  <- predict(ridge_fit_NVDA , new_data = NVDA_train %>% select(-NVDA.Close))
ridge_train_res_NVDA  <- bind_cols(ridge_train_res_NVDA , NVDA_train %>% select(NVDA.Close))
 
# Lasso Regression Training
lasso_train_res_NVDA  <- predict(lasso_fit_NVDA , new_data =  NVDA_train  %>% select(-NVDA.Close))
lasso_train_res_NVDA  <- bind_cols(lasso_train_res_NVDA ,  NVDA_train  %>% select(NVDA.Close))

# Elastic Net Training
elastic_net_train_res_NVDA  <- predict(elastic_net_fit_NVDA , new_data =  NVDA_train  %>% select(-NVDA.Close ))
elastic_net_train_res_NVDA  <- bind_cols(elastic_net_train_res_NVDA ,  NVDA_train  %>% select(NVDA.Close ))

# k-Nearest Neighbors Training
knn_train_res_NVDA  <- predict(knn_fit_NVDA , new_data =  NVDA_train  %>% select(-NVDA.Close ))
knn_train_res_NVDA  <- bind_cols(knn_train_res_NVDA ,  NVDA_train  %>% select(NVDA.Close ))
# Linear Regression Training
lm_train_res_INTC  <- predict(lm_fit_INTC , new_data = INTC_train %>% select(-INTC.Close))
lm_train_res_INTC  <- bind_cols(lm_train_res_INTC , INTC_train %>% select(INTC.Close))
 
# Ridge Regression Training
ridge_train_res_INTC  <- predict(ridge_fit_INTC , new_data = INTC_train %>% select(-INTC.Close))
ridge_train_res_INTC  <- bind_cols(ridge_train_res_INTC , INTC_train %>% select(INTC.Close))
 
# Lasso Regression Training
lasso_train_res_INTC  <- predict(lasso_fit_INTC , new_data =  INTC_train  %>% select(-INTC.Close))
lasso_train_res_INTC  <- bind_cols(lasso_train_res_INTC ,  INTC_train  %>% select(INTC.Close))

# Elastic Net Training
elastic_net_train_res_INTC  <- predict(elastic_net_fit_INTC , new_data =  INTC_train  %>% select(-INTC.Close ))
elastic_net_train_res_INTC  <- bind_cols(elastic_net_train_res_INTC ,  INTC_train  %>% select(INTC.Close ))

# k-Nearest Neighbors Training
knn_train_res_INTC  <- predict(knn_fit_INTC , new_data =  INTC_train  %>% select(-INTC.Close ))
knn_train_res_INTC  <- bind_cols(knn_train_res_INTC ,  INTC_train  %>% select(INTC.Close ))
# Linear Regression Training
lm_train_res_TSM  <- predict(lm_fit_TSM , new_data = TSM_train %>% select(-TSM.Close))
lm_train_res_TSM  <- bind_cols(lm_train_res_TSM , TSM_train %>% select(TSM.Close))
 
# Ridge Regression Training
ridge_train_res_TSM  <- predict(ridge_fit_TSM , new_data = TSM_train %>% select(-TSM.Close))
ridge_train_res_TSM  <- bind_cols(ridge_train_res_TSM , TSM_train %>% select(TSM.Close))
 
# Lasso Regression Training
lasso_train_res_TSM  <- predict(lasso_fit_TSM , new_data =  TSM_train  %>% select(-TSM.Close))
lasso_train_res_TSM  <- bind_cols(lasso_train_res_TSM ,  TSM_train  %>% select(TSM.Close))

# Elastic Net Training
elastic_net_train_res_TSM  <- predict(elastic_net_fit_TSM , new_data =  TSM_train  %>% select(-TSM.Close ))
elastic_net_train_res_TSM  <- bind_cols(elastic_net_train_res_TSM ,  TSM_train  %>% select(TSM.Close ))

# k-Nearest Neighbors Training
knn_train_res_TSM  <- predict(knn_fit_TSM , new_data =  TSM_train  %>% select(-TSM.Close ))
knn_train_res_TSM  <- bind_cols(knn_train_res_TSM ,  TSM_train  %>% select(TSM.Close ))

Model Accuracies

Root Mean Square Error (RMSE) results:

tibble(Model = c("Linear Regression", "Ridge Regression", "Lasso Regression", "Elastic Net", "k-Nearest Neighbors"),
       AMD = c((lm_train_res_AMD  %>% rmse( AMD.Close, .pred))$.estimate,
                    (ridge_train_res_AMD  %>% rmse( AMD.Close, .pred))$.estimate,
                    (lasso_train_res_AMD  %>% rmse( AMD.Close, .pred))$.estimate,
                    (elastic_net_train_res_AMD  %>% rmse( AMD.Close, .pred))$.estimate,
                    (knn_train_res_AMD  %>% rmse( AMD.Close, .pred))$.estimate ),
       NVDA = c((lm_train_res_NVDA  %>% rmse( NVDA.Close, .pred))$.estimate,
                    (ridge_train_res_NVDA  %>% rmse( NVDA.Close, .pred))$.estimate,
                    (lasso_train_res_NVDA  %>% rmse( NVDA.Close, .pred))$.estimate,
                    (elastic_net_train_res_NVDA  %>% rmse( NVDA.Close, .pred))$.estimate,
                    (knn_train_res_NVDA  %>% rmse( NVDA.Close, .pred))$.estimate ),
       INTC = c((lm_train_res_INTC  %>% rmse( INTC.Close, .pred))$.estimate,
                    (ridge_train_res_INTC  %>% rmse( INTC.Close, .pred))$.estimate,
                    (lasso_train_res_INTC  %>% rmse( INTC.Close, .pred))$.estimate,
                    (elastic_net_train_res_INTC  %>% rmse( INTC.Close, .pred))$.estimate,
                    (knn_train_res_INTC  %>% rmse( INTC.Close, .pred))$.estimate ),
       TSM = c((lm_train_res_TSM  %>% rmse( TSM.Close, .pred))$.estimate,
                    (ridge_train_res_TSM  %>% rmse( TSM.Close, .pred))$.estimate,
                    (lasso_train_res_TSM  %>% rmse( TSM.Close, .pred))$.estimate,
                    (elastic_net_train_res_TSM  %>% rmse( TSM.Close, .pred))$.estimate,
                    (knn_train_res_TSM  %>% rmse( TSM.Close, .pred))$.estimate )
       ) %>%
  kable() %>%
  kable_styling(full_width = F) %>%
  scroll_box(width = "100%", height = "200px")
Model AMD NVDA INTC TSM
Linear Regression 0.0529450 0.492420 0.0000000 0.0948985
Ridge Regression 3.1090870 12.717839 1.0051845 1.5427664
Lasso Regression 2.9645541 9.544530 1.2526970 1.7299049
Elastic Net 0.5595055 5.337853 0.1712020 0.2013565
k-Nearest Neighbors 1.4430606 4.906444 0.1501175 0.5515341

R^2 results:

tibble(Model = c("Linear Regression", "Ridge Regression", "Lasso Regression", "Elastic Net", "k-Nearest Neighbors"),
       AMD = c((lm_train_res_AMD  %>% rsq( AMD.Close, .pred))$.estimate,
                    (ridge_train_res_AMD  %>% rsq( AMD.Close, .pred))$.estimate,
                    (lasso_train_res_AMD  %>% rsq( AMD.Close, .pred))$.estimate,
                    (elastic_net_train_res_AMD  %>% rsq( AMD.Close, .pred))$.estimate,
                    (knn_train_res_AMD  %>% rsq( AMD.Close, .pred))$.estimate ),
       NVDA = c((lm_train_res_NVDA  %>% rsq( NVDA.Close, .pred))$.estimate,
                    (ridge_train_res_NVDA  %>% rsq( NVDA.Close, .pred))$.estimate,
                    (lasso_train_res_NVDA  %>% rsq( NVDA.Close, .pred))$.estimate,
                    (elastic_net_train_res_NVDA  %>% rsq( NVDA.Close, .pred))$.estimate,
                    (knn_train_res_NVDA  %>% rsq( NVDA.Close, .pred))$.estimate ),
       INTC = c((lm_train_res_INTC  %>% rsq( INTC.Close, .pred))$.estimate,
                    (ridge_train_res_INTC  %>% rsq( INTC.Close, .pred))$.estimate,
                    (lasso_train_res_INTC  %>% rsq( INTC.Close, .pred))$.estimate,
                    (elastic_net_train_res_INTC  %>% rsq( INTC.Close, .pred))$.estimate,
                    (knn_train_res_INTC  %>% rsq( INTC.Close, .pred))$.estimate ),
       TSM = c((lm_train_res_TSM  %>% rsq( TSM.Close, .pred))$.estimate,
                    (ridge_train_res_TSM  %>% rsq( TSM.Close, .pred))$.estimate,
                    (lasso_train_res_TSM  %>% rsq( TSM.Close, .pred))$.estimate,
                    (elastic_net_train_res_TSM  %>% rsq( TSM.Close, .pred))$.estimate,
                    (knn_train_res_TSM  %>% rsq( TSM.Close, .pred))$.estimate )
       ) %>%
  kable() %>%
  kable_styling(full_width = F) %>%
  scroll_box(width = "100%", height = "200px")
Model AMD NVDA INTC TSM
Linear Regression 0.9999912 0.9999792 1.0000000 0.9997814
Ridge Regression 0.9696946 0.9862282 0.9677486 0.9431276
Lasso Regression 0.9755071 0.9922683 0.9816290 0.9517890
Elastic Net 0.9990249 0.9975579 0.9990581 0.9990305
k-Nearest Neighbors 0.9936798 0.9979592 0.9992752 0.9930131